Offload intrinsic #147936

Sa4dUs · 2025-10-21T07:50:39Z

This PR implements the minimal mechanisms required to run a small subset of arbitrary offload kernels without relying on hardcoded names or metadata.

offload(kernel, (..args)): an intrinsic that generates the necessary host-side LLVM-IR code.
rustc_offload_kernel: a builtin attribute that marks device kernels to be handled appropriately.

Example usage (pseudocode):

fn kernel(x: *mut [f64; 128]) {
    core::intrinsics::offload(kernel_1, (x,))
}

#[cfg(target_os = "linux")]
extern "C" {
    pub fn kernel_1(array_b: *mut [f64; 128]);
}

#[cfg(not(target_os = "linux"))]
#[rustc_offload_kernel]
extern "gpu-kernel" fn kernel_1(x: *mut [f64; 128]) {
    unsafe { (*x)[0] = 21.0 };
}

ZuseZ4 · 2025-10-24T00:31:36Z

compiler/rustc_middle/src/ty/offload_meta.rs

+    }
+
+    pub fn from_ty<'tcx>(tcx: TyCtxt<'tcx>, ty: Ty<'tcx>) -> Self {
+        OffloadMetadata { payload_size: get_payload_size(tcx, ty), mode: TransferKind::Both }


If you already have the code here, I would add a small check for & or byVal (implies Mode ToGPU), vs &mut (implies Both).

In the future we would hope to analyze the & or byval case more, if we never read from it (before writing) then we could use a new mode 4, which allocates directly on the gpu.

bors · 2025-11-05T11:53:48Z

☔ The latest upstream changes (presumably #148507) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2025-11-09T13:03:34Z

☔ The latest upstream changes (presumably #148721) made this pull request unmergeable. Please resolve the merge conflicts.

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs

compiler/rustc_codegen_llvm/src/intrinsic.rs

compiler/rustc_middle/src/ty/offload_meta.rs

rustbot · 2025-11-16T10:27:24Z

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

library/core/src/intrinsics/mod.rs

RalfJung · 2025-11-16T14:23:49Z

library/core/src/intrinsics/mod.rs

 #[rustc_intrinsic]
 pub const fn autodiff<F, G, T: crate::marker::Tuple, R>(f: F, df: G, args: T) -> R;

+/// Generates the LLVM body of a wrapper function to offload a kernel `f`.


We have other backends besides LLVM, so intrinsics typically should be described in terms of what they do, not implementation details. Is that possible here?

this intrinsic only makes sense in LLVM right now because it relies directly on LLVM's offload feature. that's why i wanted to specify the backend

if there's a better way to proceed, please let me know

Hm, seems like we did something similar for the autodiff intrinsic. I would assume that the concept of offloading is independent of LLVM, but maybe we don't have to figure out that full story at this point.

Are there docs for the LLVM offload feature you could link to?

i'd say https://clang.llvm.org/docs/OffloadingDesign.html contains all the relevant details

ping @ZuseZ4 in case he has something better

Yes, your link is probably the best overview. Offload grew out of OpenMP, which is also supported by other compilers like GCC. LLVM just put in some effort to split Offloading and OpenMP, so that the former is easier to use independently. https://gcc.gnu.org/projects/gomp/

ping @antoyo just for awareness.

With respect to a high-level explanation of this intrinsic:
We use a single-source, two-pass compilation approach. We compile all functions that should be offloaded for the device (e.g nvptx64, amdgcn-amd-amdhsa, intel in the future) and which are marked by our intrinsic. We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens. On the host side, we generate calls to the openmp offload runtime, to inform it about the layout of the types (a simplified version of the autodiff TypeTrees). We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. &[f32;1024]), from the device, or both (e.g. &mut [f64]). We then launched the kernel, after which we inform the runtime to end this environment and move data back (as far as needed).

There are obviously a lot of features and optimizations which we want to add in the future. The Rust frontend currently also mostly uses the OpenMP API, since it was more stable back when I started working on it. We intend to move over to the newer offload API, which is slightly lower level.

We use a single-source, two-pass compilation approach. We compile all functions that should be offloaded for the device (e.g nvptx64, amdgcn-amd-amdhsa, intel in the future) and which are marked by our intrinsic. We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens. On the host side, we generate calls to the openmp offload runtime, to inform it about the layout of the types (a simplified version of the autodiff TypeTrees). We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. &[f32;1024]), from the device, or both (e.g. &mut [f64]). We then launched the kernel, after which we inform the runtime to end this environment and move data back (as far as needed).

That also seems worth documenting somewhere, though I am not entirely sure where. And the type system interactions here definitely need to be discussed with t-opsem and probably other teams before anything in this area gets stabilized.

ZuseZ4 · 2025-11-17T05:33:27Z

I'll do another round later, but can you also update https://rustc-dev-guide.rust-lang.org/offload/usage.html#usage,
to show how end-to-end testing now works? The dev guide is also a submodule of rustc, so you can change it as part of this PR where needed.

bors · 2025-11-17T12:14:31Z

☔ The latest upstream changes (presumably #149013) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot · 2025-11-17T17:40:34Z

The rustc-dev-guide subtree was changed. If this PR only touches the dev guide consider submitting a PR directly to rust-lang/rustc-dev-guide otherwise thank you for updating the dev guide with your changes.

cc @BoxyUwU, @jieyouxu, @Kobzol, @tshepang

rustbot · 2025-11-17T17:40:36Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

ZuseZ4

Oh, I wanted to submit these yesterday.

View changes since this review

ZuseZ4 · 2025-11-17T05:46:08Z

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs

    // Step 2)
    let s_ident_t = generate_at_one(&cx);
-    let o = memtransfer_types[0];
+    let o = memtransfer_types;


just use memtransfer_types directly

ZuseZ4 · 2025-11-17T06:14:28Z

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs

+    metadata: &[OffloadMetadata],
+    types: &[&Type],
+    symbol: &str,
+) -> (&'ll llvm::Value, &'ll llvm::Value, &'ll llvm::Value, &'ll llvm::Value) {


You seem to pass those 4 around together, and it's a bit hard to tell from the function signature what you're returning. Can you create a small struct so we can return that instead, and add a one-sentence documentation to it? It looks like we need these four per kernel that we want to handle.

ZuseZ4 · 2025-11-17T06:21:55Z

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs

-    let mut builder = SBuilder::build(cx, kernel_call_bb);
-
-    let types = cx.func_params_types(cx.get_type_of_global(called));
+    let mut builder = SBuilder::build(cx, bb);


Can you add a FIXME, so we can get rid of this? I don't think it should be a permanent solution. I'm also somewhat confused about why they are unused.

i had a similar issue with autodiff, i think that, as the intrinsic is lowered relatively early in the compilation pipeline, it goes through more LLVM opt passes, and since there isn't yet any info that they will be used by the offloading feature, LLVM internalizes them (i tried to prevent them from being optimized by changing the linkage, but they always appeared as internal) and then removes them because unused internal variables

i understand that in the first version of codegen, when this is done in fat LTO, LLVM is already aware of what will actually happen and doesn't modify them

We launch all LLVM passes at once via an LLVM PassManager, so it shouldn't change. But as long as it works, we can postpone investigations till later, the PR is enough of an improvement.

ZuseZ4 · 2025-11-17T06:24:17Z

compiler/rustc_codegen_llvm/src/intrinsic.rs

+            &target_symbol,
+        );
+
+    let bb = unsafe { llvm::LLVMGetInsertBlock(bx.llbuilder) };


It seems like you get a bb from a builder, just to then create a builder out of the bb inside of gen_call_handling, right? Can you directly pass the builder?

ZuseZ4 · 2025-11-17T06:28:26Z

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs

    // Step 0)
    // %struct.__tgt_bin_desc = type { i32, ptr, ptr, ptr }
    // %6 = alloca %struct.__tgt_bin_desc, align 8
-    unsafe { llvm::LLVMRustPositionBuilderPastAllocas(builder.llbuilder, main_fn) };


Now that you create / reuse a builder, are you sure that the tgt_bin_desc would get an alloca in the right position in the first bb (and it nost just working in the test by coincidence)?

E.g.

fn main() { if (condition) { } else { } core::intrinsic::offload(args); }

If the builder is on the intrinsic, the alloca wouldn't land where it should (in the beginning).

the idea (or at least how i'd imagined it) is that when expanding the future macro, the wrapper function should always contain only the intrinsic, so we can generate all the logic sequentially

if you mean that it needs to be at the beginning of the first bb of the program, just let me know and i'll change that

Yes, allocas should all be together at the beginning, so moving the builder via LLVMRustPositionBuilderPastAllocas (and putting the builder back into the old place) would be the way to go.
It might work if you put them elsewhere, but LLVM opt passes don't really expect that, so we're likely to miss out on some optimizations.

I agree that we should later distinguish better between the kernel launch intrinsic and the globals that are somewhat independent of the number of kernel launches.

ZuseZ4 · 2025-11-17T06:28:59Z

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs

    let a5 = builder.direct_alloca(tgt_kernel_decl, Align::EIGHT, "kernel_args");

    // Step 1)
-    unsafe { llvm::LLVMRustPositionBefore(builder.llbuilder, kernel_call) };


Same question for the position of the memset without repositioning the builder.

ZuseZ4 · 2025-11-17T06:30:41Z

compiler/rustc_codegen_llvm/src/intrinsic.rs

    );
 }

+fn codegen_offload<'ll, 'tcx>(


Can you add some docs to this function?

bors · 2025-11-18T15:17:18Z

☔ The latest upstream changes (presumably #148151) made this pull request unmergeable. Please resolve the merge conflicts.

ZuseZ4 self-assigned this Oct 21, 2025

Sa4dUs force-pushed the offload-intrinsic branch from 9118683 to 23722aa Compare October 21, 2025 19:45

This comment has been minimized.

Sign in to view

ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Oct 22, 2025

ZuseZ4 reviewed Oct 24, 2025

View reviewed changes

ZuseZ4 mentioned this pull request Oct 24, 2025

Tracking Issue for GPU-offload #131513

Open

5 tasks

This comment has been minimized.

Sign in to view

Sa4dUs force-pushed the offload-intrinsic branch from e0fd7be to 97a8e96 Compare November 7, 2025 15:37

This comment has been minimized.

Sign in to view

rustbot added the A-attributes Area: Attributes (`#[…]`, `#![…]`) label Nov 11, 2025

ZuseZ4 reviewed Nov 14, 2025

View reviewed changes

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs Outdated Show resolved Hide resolved

ZuseZ4 reviewed Nov 14, 2025

View reviewed changes

compiler/rustc_codegen_llvm/src/intrinsic.rs Outdated Show resolved Hide resolved

ZuseZ4 reviewed Nov 14, 2025

View reviewed changes

compiler/rustc_middle/src/ty/offload_meta.rs Outdated Show resolved Hide resolved

Sa4dUs force-pushed the offload-intrinsic branch from 3540edb to e9d89ce Compare November 14, 2025 22:33

This comment has been minimized.

Sign in to view

Sa4dUs force-pushed the offload-intrinsic branch from e9d89ce to a08949b Compare November 15, 2025 09:49

This comment has been minimized.

Sign in to view

Sa4dUs force-pushed the offload-intrinsic branch from a08949b to 9397d31 Compare November 15, 2025 11:24

This comment has been minimized.

Sign in to view

Sa4dUs force-pushed the offload-intrinsic branch from 9397d31 to 7666b58 Compare November 16, 2025 09:10

Sa4dUs marked this pull request as ready for review November 16, 2025 10:27

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 16, 2025

RalfJung reviewed Nov 16, 2025

View reviewed changes

library/core/src/intrinsics/mod.rs Show resolved Hide resolved

Sa4dUs commented Nov 16, 2025

View reviewed changes

library/core/src/intrinsics/mod.rs Show resolved Hide resolved

This comment has been minimized.

Sign in to view

Sa4dUs force-pushed the offload-intrinsic branch from e38248d to b34be91 Compare November 16, 2025 12:31

RalfJung reviewed Nov 16, 2025

View reviewed changes

Sa4dUs added 14 commits November 17, 2025 18:08

first definition of offload intrinsic (dirty code)

892aaa3

Add basic offload metadata

0ad8f43

Set maptypes using offload metadata

bb5620a

Pass frontend info to gen_call_handling

81c4bb2

Mark globals as used + some minor fixes

89fd335

Get types from fn_sig

29a3eac

Don't depend on outer fn and some cleanup

75fdf41

Add string attr to apply extra ptr arg to offload kernels

3dcb78a

Prevent globals from being optimized without relying on llvm.used

0029424

Add mapping bitflags and general cleanup

68a7a9f

Iterate over params and minor fixes

9c9aac0

Emit erros when invalid config

d43655b

Add intrinsic doc comment

a4c3e6c

Update rustc-dev-guide

0cee58b

Sa4dUs force-pushed the offload-intrinsic branch from b34be91 to 0cee58b Compare November 17, 2025 17:40

rustbot added the A-rustc-dev-guide Area: rustc-dev-guide label Nov 17, 2025

ZuseZ4 reviewed Nov 17, 2025

View reviewed changes

Offload intrinsic #147936

Are you sure you want to change the base?

Offload intrinsic #147936

Uh oh!

Conversation

Sa4dUs commented Oct 21, 2025 • edited by ZuseZ4 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

bors commented Nov 5, 2025

Uh oh!

This comment has been minimized.

bors commented Nov 9, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

rustbot commented Nov 16, 2025

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZuseZ4 Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZuseZ4 commented Nov 17, 2025

Uh oh!

bors commented Nov 17, 2025

Uh oh!

rustbot commented Nov 17, 2025

Uh oh!

rustbot commented Nov 17, 2025

Uh oh!

ZuseZ4 left a comment • edited by rustbot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Sa4dUs Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Sa4dUs commented Oct 21, 2025 •

edited by ZuseZ4

Loading

RalfJung Nov 16, 2025 •

edited

Loading

ZuseZ4 Nov 16, 2025 •

edited

Loading

ZuseZ4 left a comment •

edited by rustbot

Loading

Sa4dUs Nov 18, 2025 •

edited

Loading